ABSTRACT


This report summarizes ILLIAC IV multi- 
spectral image processing research conducted during 
the last twelve months by the Center for Advanced 
Computation (CAC) of the University of Illinois in 
collaboration with the Laboratory for Applications 
of Remote Sensing (LARS) of Purdue University. The 
research reported has focused on the design and partial 
implementation of a comprehensive ILLIAC IV software 
system for computer-assisted interpretation of multispectral
earth resources data such as that now collected
by the Earth Resources Technology Satellite (ERTS).
Research to date suggests generally that the ILLIAC IV 
should be as much as two orders of magnitude more cost- 
effective than serial processing computers for digital 
interpretation of ERTS imagery via multivariate statistical
classification techniques. The potential of the
ARPA Network as a mechanism for interfacing geographi- 
cally-dispersed users to an ILLIAC IV image processing 
facility is discussed. 

Introduction

This report summarizes ILLIAC IV multispectral image processing 
research conducted during the last twelve months by the Center for Advanced 
Computation (CAC) of the University of Illinois in collaboration with the 
Laboratory for Applications of Remote Sensing (LARS) of Purdue University. 

The research reported focuses on the design and partial implementation of 
a comprehensive ILLIAC IV software system for computer-assisted interpre- 
tation of multispectral earth resources data, such as that now collected by 
the Earth Resources Technology Satellite (ERTS). This work has been undertaken
in support of the earth resources monitoring objectives of the
ERTS/EROS programs of NASA and USGS, and in support of the ILLIAC IV 
applications programs of ARPA and NASA at Ames Research Center. 

Since its launch in July 1972, ERTS has demonstrated the practicality
of orbital remote sensing systems as mechanisms for dynamic surveillance
of natural resources and land uses at regional, state, and national
scales. 1,2,3 From a sun-synchronous polar orbit 570 miles high, ERTS
routinely telemeters to ground receiving stations high-resolution multi- 
spectral scanner (MSS) images each covering a geographic area one hundred 
nautical miles square. A single ERTS MSS scene consists of 7.6 x 10^6 digital
image resolution elements (2340 scan lines and 3240 samples per line) yielding 
an effective ground resolution of about one acre. With eight bits of data 
recorded for each of four wavelength bands (two visible and two reflective 
infrared), each MSS image represents approximately 240 x 10^6 bits of data.
Since these images are digitized at rates up to one every twenty-five seconds, 
ground processing, interpretation, and meaningful use of the data generated 
pose a number of challenging problems. 
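
The data volumes quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the derived peak rate assumes one complete scene every twenty-five seconds and is an illustration, not a figure from the report.

```python
# Figures taken from the text; the derived rate is an illustration only.
SCAN_LINES = 2340
SAMPLES_PER_LINE = 3240
BANDS = 4
BITS_PER_SAMPLE = 8
SECONDS_PER_SCENE = 25  # fastest digitization rate quoted in the text


def scene_statistics():
    """Return (resolution elements, bits per scene, peak bits per second)."""
    elements = SCAN_LINES * SAMPLES_PER_LINE
    bits = elements * BANDS * BITS_PER_SAMPLE
    peak_rate = bits / SECONDS_PER_SCENE
    return elements, bits, peak_rate


elements, bits, peak_rate = scene_statistics()
print(f"{elements:,} elements per scene")       # 7,581,600
print(f"{bits:,} bits per scene")                # 242,611,200
print(f"{peak_rate / 1e6:.1f} Mbit/s at peak")   # 9.7
```

The first two results agree with the rounded values of 7.6 x 10^6 elements and 240 x 10^6 bits given above.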

Computer methods of automatic image interpretation and information
management offer promising means for reduction of the enormous data streams
afforded by remote sensing technologies like ERTS into information convenient
for regional, state, and national resource management agencies. For almost
a decade now, research has progressed toward the application of statistical
pattern recognition techniques within operational systems for automatic image
interpretation. 4,5 To date, however, the sheer magnitude of the data processing
required and the efficiencies of available computers have severely limited
the cost-effectiveness of such procedures outside research environments. The
present research has been undertaken to determine the extent to which this
condition is changing due to emerging information processing and communications
technologies such as the ILLIAC IV parallel computer 6,7 and the nationwide
ARPA Network. 8,9

In this document we report research determinations to date concerning 
the efficiencies of multispectral earth resources data interpretation on the 
ILLIAC IV. Algorithms now operational on ILLIAC IV for analysis of ERTS MSS 
data are described, and execution times for these procedures are documented. 
General strategies are discussed for implementing data storage and retrieval 
systems, and interactive information management systems appropriate to ILLIAC IV 
image processing capabilities. Additionally, we discuss the potentials of the 
ARPA Network as a mechanism for interfacing a geographically-dispersed community
of users to an ILLIAC IV multispectral image interpretation facility.

The Multivariate Analysis Approach to Earth Resources Data Interpretation 

Procedures developed to date for ILLIAC IV interpretation of ERTS MSS 
imagery represent parallel-computation implementations of a subset of the 
multivariate analysis techniques previously researched at LARS in serial-computation
mode for interpretation of multispectral earth resources data.

(For an introductory mathematical description of multivariate methods of multispectral
image interpretation as applied at LARS, see SWAIN. 10) The following
verbal description of the methodology pursued for ILLIAC IV interpretations
is offered in the interest of completeness of the present report.

The multivariate analysis methodology of multispectral image interpre- 
tation assumes that each earth terrain type of interest (agricultural, natural 
resource, land use, etc.) reflects, absorbs, and emits solar energies of 
various wavelengths in characteristic proportions. The model may be considered 
a highly sophisticated mathematical generalization of the process by which we 
recognize various landscape features according to color. Computer implementa- 
tions seek to interpret entire images by independent classification of all 
image resolution elements into terrain categories according to the spectral 
data recorded for each element. Thus, while the output of such a procedure 
may be an interpreted terrain map, no spatial pattern recognition theory need 
be employed. 

The methodology proceeds by representing all individual resolution 
elements within an image as independent, multivariate observations that may 
be grouped into meaningful terrain categories according to the multivariate 
distributions of spectral observations within each category of interest. Thus, 
the methodology assumes that the terrain categories to be recognized and mapped 
have spectral characteristics sufficiently dissimilar across the wavelength 
bands recorded that the multivariate distributions of observations within each 
category are statistically separable and distinct. Where terrain categories 
of interest have distinct spectral profiles, several approaches to computer- 
assisted image interpretation are possible. 

The supervised classification approach assumes that all image
resolution elements corresponding to particular terrain categories can be 
characterized with reference to the spectral properties of known samples of 
each terrain type. Given spectral characterizations for each terrain category 
of interest, resolution elements throughout an image are classified into terrain 
categories with reference to spectral properties alone. Statistical classifica- 
tion theory may be conveniently employed. Given ground truth information 
identifying fixed sets of resolution elements representative of each terrain 
category of interest, statistics are computed to estimate the multivariate 
distribution of spectral properties for all resolution elements corresponding 
to each terrain type. An entire image may then be automatically interpreted 
in point-by-point fashion by statistical classification of each image resolu- 
tion element in accordance with the maximum of the discriminant function values 
computed for all terrain categories. 
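
The supervised procedure described above can be sketched in a few functions. This is a minimal illustration and not the LARS or ILLIAC IV code: it assumes multivariate normal categories with equal prior probabilities, uses a small hand-written matrix inversion, and the two-band training values and category names are hypothetical.

```python
import math


def class_statistics(samples):
    """Estimate the mean vector and covariance matrix of training samples."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def inverse_and_logdet(matrix):
    """Gauss-Jordan inversion with partial pivoting; also returns ln|det|."""
    d = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(d)]
           for i, row in enumerate(matrix)]
    logdet = 0.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        logdet += math.log(abs(aug[col][col]))
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug], logdet


def train(training_sets):
    """Map each category to (mean, inverse covariance, ln|covariance|)."""
    model = {}
    for name, samples in training_sets.items():
        mean, cov = class_statistics(samples)
        inv, logdet = inverse_and_logdet(cov)
        model[name] = (mean, inv, logdet)
    return model


def discriminant(x, mean, inv, logdet):
    """Gaussian log-likelihood up to a constant (equal priors assumed)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    d = len(diff)
    quad = sum(diff[i] * inv[i][j] * diff[j]
               for i in range(d) for j in range(d))
    return -0.5 * logdet - 0.5 * quad


def classify(x, model):
    """Assign x to the category with the largest discriminant value."""
    return max(model, key=lambda name: discriminant(x, *model[name]))


# Hypothetical two-band ground truth samples for two terrain categories.
training = {
    "water": [(10, 5), (12, 6), (11, 4), (9, 7), (13, 5)],
    "vegetation": [(30, 60), (32, 58), (29, 63), (31, 61), (33, 59)],
}
model = train(training)
print(classify((11, 5), model))    # water
print(classify((30, 61), model))   # vegetation
```

Each resolution element is classified independently of its neighbors, which is why the procedure lends itself to parallel execution.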

An alternative approach to computer-assisted interpretation relies 
on multivariate cluster analysis of image resolution elements. Following this 
approach, resolution elements comprising an area within an image are analyzed 
with respect to spectral properties, and assigned to data clusters such that 
image elements within each cluster have similar spectral properties while 
elements of different clusters have dissimilar characteristics. Since the 
multivariate methodology, in general, presupposes the existence of distinct 
spectral profiles for terrain categories of interest, there should exist some 
simple correspondence between resulting clusters and terrain types. A partic- 
ular terrain classification schema is established in an a posteriori fashion 
by relating cluster analysis results to available ground truth information. 
Since, by this method, no a priori classification schema is assumed for machine
categorization of input data, the cluster analysis approach may be considered
an unsupervised classification strategy.
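
The idea of grouping resolution elements by spectral similarity can be illustrated with a plain iterative nearest-center procedure. This sketch is a deliberate simplification: it is not the cluster analysis algorithm used at LARS, it uses squared Euclidean distance, and the sample values and starting centers are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two spectral vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the cluster center closest to the point."""
    return min(range(len(centers)),
               key=lambda k: squared_distance(point, centers[k]))


def cluster(points, centers, max_iterations=50):
    """Alternate assignment and center update until assignments stop changing."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iterations):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical two-band observations forming two spectral groups.
points = [(10, 5), (12, 6), (11, 4), (30, 60), (32, 58), (29, 63)]
labels, centers = cluster(points, [points[0], points[3]])
print(labels)       # [0, 0, 0, 1, 1, 1]
print(centers[0])   # [11.0, 5.0]
```

The cluster numbers carry no terrain meaning by themselves; as the text notes, the correspondence to terrain types is established afterward from ground truth.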

Multivariate statistical classification and cluster analysis tech- 
niques may be employed in a variety of ways together within computer-assisted 
multispectral image interpretation. Even where terrain categories to be 
recognized are prespecified and ground truth information has been collected 
as training data for supervised classification, training samples may be cluster 
analyzed first to determine the existence of multiple spectral profiles for 
image elements of single nominal categories. 

A combination of cluster analysis and classification methods is also 
advantageous where data is to be interpreted that corresponds to large regions 
for which little is known about the spectral distinctness of terrain types. A
systematic sample of all data may be cluster analyzed to determine spectrally 
separable terrain clusters. Using statistics computed for such clusters, all 
resolution elements within the region may be classified into the terrain types 
represented by all clusters. Once such a classification is mapped, a corre- 
spondence between the spectrally determined clusters and specific terrain 
categories can be established. 

ELLEFSEN, SWAIN, and WRAY 11 used such a combination of cluster analysis
and classification methods at LARS and have reported some success in producing 
large-scale regional land-use maps for the San Francisco Bay area from ERTS MSS 
data. Eight urban terrain categories and three bordering nonurban terrain types 
were identified and mapped with considerable accuracy. Results of smaller test 
analyses conducted at CAC using the same ERTS data tend to conform to LARS 
results with respect to identifiable terrain categories. 

ILLIAC IV Efficiencies for Multispectral Image Interpretation

Due to the central roles of the statistical classification and cluster 
analysis algorithms within the LARS methodology of multispectral image interpretation,
parallel-computation implementations of these two procedures have been
developed at CAC to provide basic ILLIAC IV analysis capabilities with which 
the efficiencies of the ILLIAC IV for multispectral image processing can be 
evaluated. While both procedures involve very large amounts of computation 
for only modest quantities of multispectral data, both algorithms have parallel 
structures and hence their implementations on ILLIAC IV prove quite efficient. 

The fact that 32-bit arithmetic operations are sufficient for numerical 
accuracy within both of these algorithms makes possible additional efficiencies 
for ILLIAC IV executions. 

The multivariate cluster analysis algorithm employed at LARS follows 
the ISODATA technique of BALL and HALL with modifications similar to those
suggested by SWAIN and FU. A description of the algorithm as currently
implemented at LARS can be found in SWAIN. 10 The ILLIAC IV implementation of
this algorithm has been documented by THOMAS. 15

The point-by-point classification methodology used by LARS represents 
a straightforward application of the maximum likelihood decision rule of statistical
classification theory, assuming multivariate normal distributions for image
elements within each category. SWAIN 10 describes the LARS implementation of this
statistical classification technique. THOMAS documents the corresponding
ILLIAC IV implementation of this decision rule for ERTS data interpretations. 
Efficient utilization of the parallel structure of ILLIAC IV calculations is 
achieved by classification of 128 image resolution elements simultaneously. 
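
One way to picture classification of 128 elements at a time is to loop over categories on the outside and update every element's running best score on the inside, so that all lanes execute the same step together. The sketch below imitates that ordering serially. Only the lane count of 128 comes from the text; the one-dimensional scoring functions and category names are hypothetical stand-ins for the discriminant functions.

```python
LANES = 128  # elements classified per parallel step, as stated in the text


def batches(elements, lanes=LANES):
    """Yield consecutive groups of at most `lanes` resolution elements."""
    for start in range(0, len(elements), lanes):
        yield elements[start:start + lanes]


def classify_batch(batch, score_functions):
    """Keep, for every lane, the category with the highest score so far."""
    best_score = [float("-inf")] * len(batch)
    best_label = [None] * len(batch)
    for label, score in score_functions.items():
        # On a lockstep machine this inner loop is one simultaneous step.
        for lane, element in enumerate(batch):
            value = score(element)
            if value > best_score[lane]:
                best_score[lane] = value
                best_label[lane] = label
    return best_label


# Hypothetical one-band scores: closeness to a category's typical value.
score_functions = {
    "dark": lambda x: -abs(x - 10),
    "bright": lambda x: -abs(x - 50),
}
elements = list(range(300))
labels = [lab for batch in batches(elements)
          for lab in classify_batch(batch, score_functions)]
print(len(list(batches(elements))))   # 3
print(labels[0], labels[31])          # dark bright
```

Because every lane performs the same sequence of operations on different data, no lane waits on another, which is the property the text identifies as making the algorithm efficient on ILLIAC IV.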

To evaluate the potential efficiencies of ILLIAC IV multispectral 
image interpretations with respect to both the cluster analysis and classification 
algorithms, comparative tests have been made between ILLIAC IV execution times,
and times reported by the IBM 360/67 of LARS, the DEC PDP-10 of the Information 
Sciences Institute (ISI) at Marina Del Rey, and the ILLIAC IV simulation system 
of the Burroughs 6700 at the University of California at San Diego. 

Both simulated and actual ILLIAC IV timings were obtained for these 
experiments since the currently available execution timing mechanisms of the 
ILLIAC IV itself were not sufficiently accurate for meaningful comparisons 
where only a few seconds of execution time were involved. Comparative PDP-10 
times were obtained to indicate the relative speed of backup ERTS data interpretation
on the PDP-10 processors associated with the ILLIAC IV system at Ames
Research Center and other interactive PDP-10 computers elsewhere on the ARPA 
Network. The LARS 360/67 computer was selected for comparison since it 
represents a facility dedicated to multispectral image processing. Also, LARS 
output could be used to validate the correctness of all other analysis results. 

Earlier, RAY and THOMAS reported the results of comparisons
between ILLIAC IV simulation timings and actual execution times of the ISI 
PDP-10 and the LARS 360/67 for both cluster analysis and classification algo- 
rithms. As a test data set, a 64-line by 64-column rectangle of 4096 data 
samples was chosen within the ERTS San Francisco Bay image #1003-18175. A 
smaller rectangle of 1280 data samples was taken within this test area as a 
data set for the cluster analysis comparisons. Corresponding classification 
algorithms were executed on all three computers for the complete test area of 
4096 samples. Fortran algorithms were run on the PDP-10 and 360/67 computers. 
Both ILLIAC IV algorithms were programmed in ASK. A summary of the results of
these experiments is presented in Table 1.

Table 1. Simulated ILLIAC IV, PDP-10 and 360/67 Execution Times

                              ILLIAC IV      PDP-10      360/67

Cluster Analysis
  CPU Time (secs)                 0.029       38.43       14.31
  ILLIAC IV Speed Factors                      1325         493

Classification
  CPU Time (secs)                0.0068       46.80        9.54
  ILLIAC IV Speed Factors                      6882        1403

Since actual ILLIAC IV execution times for these tests were too small 
to be clocked, larger test data sets were selected for comparison of ILLIAC IV 
and LARS 360/67 computations. Within the same ERTS MSS image, a 256-line by 
512-column test area of 131,072 resolution elements was selected for classifi- 
cation. (See Figure 2.) A 1/16 systematic sample of 8,192 resolution elements 
taken every fourth line and every fourth column throughout the test area was 
chosen for cluster analysis comparisons. After cluster analysis of the sampled 
data of 8K elements into thirty-two clusters, all 131K resolution elements of 
the test area were classified into the corresponding thirty-two categories. 
Summary results of these experiments are presented below. 

Table 2. ILLIAC IV and 360/67 Execution Times for
ERTS Data Cluster Analysis and Classification

                              ILLIAC IV      360/67

Cluster Analysis
  CPU Time (sec)               25 (±2)         3602

Classification
  CPU Time (sec)                6 (±2)         1464
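
The sizes of the test area and of the 1/16 systematic sample used for these timings can be reproduced as follows. The sketch assumes that sampling starts at the first line and first column, which the text does not specify.

```python
def systematic_sample(lines, columns, step):
    """Return (line, column) indices of every `step`-th line and column."""
    return [(r, c)
            for r in range(0, lines, step)
            for c in range(0, columns, step)]


test_area = 256 * 512
sample = systematic_sample(256, 512, 4)
print(test_area)                  # 131072
print(len(sample))                # 8192
print(test_area // len(sample))   # 16
```

Taking every fourth line and every fourth column keeps one element in sixteen, which matches the 8,192-element sample drawn from the 131,072-element test area.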


Figure 1. Photo reproductions of the four spectral bands of ERTS-1
image #1003-18175 selected as a test data set for ARPA Network-ILLIAC IV
system development work. This imagery was taken over San Francisco Bay
on 26 July 1972.

Figure 2. A line printer grayscale display of the 256-line by 512-column
image area selected for comparison of the ILLIAC IV and IBM 360/67 execution
times for both the cluster analysis and statistical classification algorithms.

Again, ILLIAC IV classification of core memory-contained ERTS MSS data
is too fast to be meaningfully clocked with existing timing mechanisms. The
results obtained, however, would indicate an ILLIAC IV-360/67 speed factor for
classification closer to two orders of magnitude than to the three orders of
magnitude suggested by the simulation timings. Similarly, the results of the
larger cluster analysis comparison would indicate an ILLIAC IV-360/67 speed
factor on the order of only 144. We explain the discrepancy between actual
factors and factors derived through simulation in the following manner: our
simulations assumed ILLIAC IV operation in instruction overlap mode at an
internal clock rate of 16 megahertz; the reported analyses were executed at 12
megahertz without instruction overlap. (The speed of the cluster analysis algorithm
as programmed is more sensitive to the absence of instruction overlap due
to the large number of data exchanges, i.e., routes, between processing elements.)
As instruction overlap operation mode and greater internal clock rates become 
available for ILLIAC IV application programming, we expect actual and simulated 
execution timings for these two algorithms to be more in agreement. 
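
The measured speed factors discussed above follow directly from the Table 2 times. The sketch below also shows how much the timing tolerance matters when the ILLIAC IV run lasts only a few seconds. It treats the quoted two-second tolerance as symmetric for both algorithms, which is an assumption, and the function name is illustrative.

```python
# (ILLIAC IV seconds, timing tolerance in seconds, 360/67 seconds), from Table 2.
measurements = {
    "cluster analysis": (25, 2, 3602),
    "classification": (6, 2, 1464),
}


def factor_range(illiac, tolerance, serial):
    """Return (lowest, nominal, highest) speed factor given the tolerance."""
    nominal = serial / illiac
    lowest = serial / (illiac + tolerance)
    highest = serial / (illiac - tolerance)
    return lowest, nominal, highest


for algorithm, values in measurements.items():
    lowest, nominal, highest = factor_range(*values)
    print(f"{algorithm}: {lowest:.0f} to {highest:.0f}, nominal {nominal:.0f}")
# cluster analysis: 133 to 157, nominal 144
# classification: 183 to 366, nominal 244

# Ratio of the simulated clock rate to the rate actually used.
print(round(16 / 12, 2))   # 1.33
```

The wide range for classification illustrates why a six-second run could not be clocked meaningfully, while the cluster analysis factor of about 144 is comparatively stable.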

Remote Multispectral Image Processing via the ARPA Network 

Within the scope of activities summarized in this report, considerable 
research has been conducted at CAC during the last twelve months to assess the 
potentials of the ARPA Network as a means to decentralize ILLIAC IV image pro- 
cessing capabilities, and to share the resources of ERTS data analysis systems 
now being developed. For almost all computational support for research activities, 
CAC relies on the variety of computing facilities accessible to it remotely via 
the ARPA Network. Of wider concern is the fact that almost all access to ILLIAC IV 
image processing capabilities for all users will be provided by the network.
Hence, we summarize here CAC networking experience within present image processing
activities. 

Concerned initially with the development of basic software for simula- 
tion of alternative ERTS data management and processing systems that might be 
implemented at Ames Research Center using the UNICON Data Computer and peripheral
PDP-10 processors in conjunction with ILLIAC IV processing, CAC has developed an 
interactive ERTS data analysis system that is now operational on a number of 
PDP-10 computers on the network. 

Designed to be addressed through low-cost portable computer communica- 
tions terminals, the system allows interactive selection of rectangular image 
analysis windows from ERTS data tapes, gray-scale display of the raw data within 
these windows, interpretation via both cluster analysis and classification tech- 
niques, and printed character display of interpreted windows. Such interactive 
windowing allows convenient delineation of image resolution elements corresponding 
to areas of ground truth information. Windows may also be retrieved as systematic
geographic samples of larger rectangular ground areas. The nature of portable 
terminals restricts use of the system to small scale data analysis. On the other 
hand, sufficient features have been incorporated into the system that the Statis- 
tical Reporting Service (SRS) of USDA plans to use the system experimentally to 
analyze small areas within ERTS images corresponding to agricultural test areas 
already established for SRS remote sensing research. SRS will access the ARPA 
Network through the node facilities of the National Bureau of Standards in 
Gaithersburg, Maryland. 

Considerable experience has also been gained with network transmission 
of larger quantities of ERTS MSS data. Raw and interpreted ERTS data files on 
the order of 5 megabits corresponding to line printer displays approximately 
3' by 5' are now routinely transmitted between sites on the network in less 
than fifteen minutes. This experience suggests that where terminal facilities 
permit, multispectral image processing can be accessed remotely over the network 
with essentially RJE terminal convenience. 
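
The transfer figures above imply a minimum average throughput, which can be estimated as follows. The sketch takes "5 megabits" as 5 x 10^6 bits and "less than fifteen minutes" as an upper bound on time, so the result is only a lower bound on the effective rate. The extrapolation to a full four-band MSS scene is an illustration, not a figure from the report.

```python
FILE_BITS = 5_000_000              # "on the order of 5 megabits"
TRANSFER_SECONDS = 15 * 60         # upper bound quoted in the text
SCENE_BITS = 2340 * 3240 * 4 * 8   # one full four-band MSS scene


def effective_rate(bits, seconds):
    """Average throughput in bits per second."""
    return bits / seconds


rate = effective_rate(FILE_BITS, TRANSFER_SECONDS)
scene_hours = SCENE_BITS / rate / 3600
print(round(rate))             # 5556
print(round(scene_hours, 1))   # 12.1
```

At this rate a complete scene would take on the order of half a day to move, which is consistent with the text's emphasis on transmitting analysis windows and interpreted files rather than whole images.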

Access to the ILLIAC IV and numerous other computational facilities
via the ARPA Network has allowed CAC to develop considerable experimental soft- 
ware for graphical display of raw and interpreted ERTS MSS data. Using remote 
processing and terminal output devices at CAC, software has been developed for 
gray-scale line printer displays of ERTS imagery and geometrically corrected 
line printer maps of ERTS data interpretations. Experimental drum plotter 
software has been developed for plotting interpreted ERTS data maps at scales 
corresponding to USGS 7 1/2 minute quadrangles. Additionally, a modest research 
effort has involved interactive display of raw and interpreted ERTS data on CRT 
devices at CAC. (See Figures 1-6.)

Recommended ILLIAC IV Multispectral Image Analysis Systems 

Experience with the LARS cluster analysis and statistical classifica- 
tion algorithms has indicated that ILLIAC IV batch interpretations should be 
close to two orders of magnitude more cost-effective than comparable processing 
on other large-scale computers.

Experience with portable terminal, line printer, plotter, and CRT 
graphics software now developed at CAC for ILLIAC IV-ARPA Network multispectral 
image processing has demonstrated the potential of the network as a mechanism 
for decentralizing access to ILLIAC IV data analysis capabilities. 

Therefore, we recommend continued research and development of more 
comprehensive ILLIAC IV multispectral image analysis systems. Research to date 
suggests that work in the immediate future is warranted in the following areas:

Using the UNICON Data Computer and the PDP-10 peripheral computers of
the ILLIAC IV system, software should be developed for remote interactive
maintenance and manipulation of the numerous data files associated with multispectral
image interpretation. Additional software should be developed for
interactive retrieval and display of portions of imagery corresponding to
analysis areas conveniently specified by users in terms of latitude and longitude
boundaries. Using low-cost portable terminals and pen digitizers, interactive
software should be developed for convenient specification of matched
control points between multispectral images and maps, thus facilitating
accurate geographic registration of archived data. Given accurate geographic
referencing of data and interpretations, other systems could be implemented for
interactive pen-to-image or pen-to-map information retrieval, delineation of
ground truth training sample areas, and occasional digitization of irregular
geographic networks and boundaries for display within raw and interpreted data
maps.

Figure 3. A line printer display of a portion of the test area
interpreted by the ILLIAC IV using the cluster analysis and
statistical classification algorithms. Here, the interpreted
data has been geometrically reformatted to allow direct overlay
of computer print-out with U.S.G.S. quadrangle maps of the same
geographic area.

Figure 4. Example Polaroid photos of ERTS-1 data displayed on an IMLAC CRT
device at CAC. The data displayed corresponds to the Chicago Loop area
visible in ERTS-1 image #1007-16093. The bottom-right photo represents a
color-coded interpreted data map achieved using digital color separation
techniques and multiple color film exposures.

Figure 5. The Hayward quadrangle of the U.S.G.S. 7 1/2' topographic
map series. The geographic area of this map falls completely within
the test area chosen for ILLIAC IV analyses. (See Figure 2.)

Figure 6. A computer-generated map showing current predominant
terrain types for the Hayward quadrangle area. The map displays
ERTS-1 data that has been interpreted on the ILLIAC IV and
processed further for color mapping using the Zeta drum plotter at
CAC.

Comprehensive ILLIAC IV software should be developed for drum plotter
and film scanner display of multispectral data interpretations in map format. 
Included should be capabilities for plotting, at any scale, color-interpreted 
maps of any north-south rectangle within an image. Emphasis should be placed 
on plotting terrain maps equivalent with respect to area, scale, and map projection
to USGS topographic maps of the 7 1/2', 15', and 1° x 2° quadrangle series.

ILLIAC IV system capabilities should be implemented for aggregation 
of interpreted data in cellular subdivisions by various geographic referencing 
systems, including latitude-longitude, UTM, square kilometer, and square mile 
units. System capabilities for accurate registration of imagery with base maps 
and for aggregation of data interpretations by grid locations will then allow 
efficient computer analysis and display of regional land use and natural resource 
boundary changes over time. 
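
The kind of cellular aggregation proposed above might look like the following sketch. It is an illustration only: cells here are square blocks of resolution elements, not latitude-longitude or UTM units, and the function names, category codes, and the two small maps are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_by_cell(classified, cell_size):
    """Count category labels within square cells of an interpreted map.

    `classified` is a sequence of rows, each a sequence of category labels.
    Returns {(cell_row, cell_column): Counter of labels}.
    """
    cells = defaultdict(Counter)
    for r, row in enumerate(classified):
        for c, label in enumerate(row):
            cells[(r // cell_size, c // cell_size)][label] += 1
    return dict(cells)


def changed_cells(before, after):
    """Cells whose most frequent category differs between two dates."""
    return sorted(
        cell for cell in before
        if before[cell].most_common(1)[0][0] != after[cell].most_common(1)[0][0]
    )


# Hypothetical interpreted maps (W = water, U = urban, F = forest).
earlier = ["WWUU", "WWUU", "FFUU", "FFUU"]
later = ["WWUU", "WWUU", "UUUU", "FUUU"]
cells_earlier = aggregate_by_cell(earlier, 2)
cells_later = aggregate_by_cell(later, 2)
print(cells_later[(1, 0)])                        # Counter({'U': 3, 'F': 1})
print(changed_cells(cells_earlier, cells_later))  # [(1, 0)]
```

Comparing cell summaries rather than individual elements is one way to report land use change that tolerates small registration errors between images.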

As time and resources permit, other research should concern development
and implementation of additional ILLIAC IV algorithms for multiple image
registration and terrain classification with respect to seasonal change. Additional
research should explore the problems and advantages associated with
ILLIAC IV analysis of digitized U-2 film imagery as supplementary, high- 
resolution multispectral data sources. 